14 research outputs found

    Black-Box Data-efficient Policy Search for Robotics

    The most data-efficient algorithms for reinforcement learning (RL) in robotics are based on uncertain dynamical models: after each episode, they first learn a dynamical model of the robot, then they use an optimization algorithm to find a policy that maximizes the expected return given the model and its uncertainties. It is often believed that this optimization can be tractable only if analytical, gradient-based algorithms are used; however, these algorithms require specific families of reward functions and policies, which greatly limits the flexibility of the overall approach. In this paper, we introduce a novel model-based RL algorithm, called Black-DROPS (Black-box Data-efficient RObot Policy Search), that: (1) does not impose any constraint on the reward function or the policy (they are treated as black boxes), (2) is as data-efficient as the state-of-the-art algorithm for data-efficient RL in robotics, and (3) is as fast as (or faster than) analytical approaches when several cores are available. The key idea is to replace the gradient-based optimization algorithm with a parallel, black-box algorithm that takes the model uncertainties into account. We demonstrate the performance of our new algorithm on two standard control benchmark problems (in simulation) and on a low-cost robotic manipulator (with a real robot).
    Comment: Accepted at the IEEE/RSJ International Conference on Intelligent Robots and Systems (IROS) 2017; Code at http://github.com/resibots/blackdrops; Video at http://youtu.be/kTEyYiIFGP
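
    To make the key idea concrete, here is a minimal, hypothetical Python sketch of the Black-DROPS principle: the expected return of a candidate policy is estimated by Monte-Carlo rollouts through an uncertain dynamics model, and a black-box optimizer searches the policy parameters. The toy one-dimensional system, the Gaussian stand-in for the learned model, and the simple (mu, lambda) evolution strategy are assumptions for illustration; the paper itself uses Gaussian-process dynamics models and a CMA-ES variant.

# Minimal sketch of the Black-DROPS idea (assumptions: a toy 1-D system and a
# Gaussian "learned" dynamics model stand in for the probabilistic model of the
# paper; a basic (mu, lambda) evolution strategy stands in for the CMA-ES
# variant).  The policy and the reward are plain callables, i.e. black boxes.
import numpy as np

rng = np.random.default_rng(0)

def model_step(state, action):
    """Uncertain dynamics model: returns mean and std of the next state."""
    mean = state + 0.1 * action            # toy mean dynamics
    std = 0.05 * (1.0 + abs(action))       # uncertainty grows with |action|
    return mean, std

def policy(theta, state):
    """Black-box policy: here a tiny linear controller with parameters theta."""
    return theta[0] * state + theta[1]

def reward(state, action):
    """Black-box reward: drive the state to zero with a small action penalty."""
    return -(state ** 2) - 0.01 * action ** 2

def expected_return(theta, horizon=30, n_rollouts=20):
    """Monte-Carlo estimate of the expected return under the uncertain model."""
    total = 0.0
    for _ in range(n_rollouts):
        s = 1.0                              # fixed initial state
        for _ in range(horizon):
            a = policy(theta, s)
            total += reward(s, a)
            mean, std = model_step(s, a)
            s = rng.normal(mean, std)        # sample the model's uncertainty
    return total / n_rollouts

# Simple (mu, lambda) evolution strategy as the black-box optimizer; each
# candidate evaluation is independent and could run on a separate core.
mean, sigma, n_pop, n_elite = np.zeros(2), 0.5, 32, 8
for _ in range(30):
    pop = mean + sigma * rng.standard_normal((n_pop, 2))
    scores = np.array([expected_return(p) for p in pop])   # embarrassingly parallel
    elite = pop[np.argsort(scores)[-n_elite:]]
    mean = elite.mean(axis=0)
print("best parameters:", mean, "return:", expected_return(mean))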

    Fast Online Adaptation in Robotics through Meta-Learning Embeddings of Simulated Priors

    Meta-learning algorithms can accelerate model-based reinforcement learning (MBRL) by finding an initial set of parameters for the dynamical model such that the model can be trained to match the actual dynamics of the system with only a few data points. However, in the real world a robot might encounter a wide range of situations, from motor failures to rocky terrain, in which the dynamics of the robot differ significantly from one another. In this paper, we first show that when the meta-training situations (the prior situations) have such diverse dynamics, using a single set of meta-trained parameters as a starting point still requires a large number of observations from the real system to learn a useful model of the dynamics. Second, we propose an algorithm called FAMLE that mitigates this limitation by meta-training several initial starting points (i.e., initial parameters) for training the model, and allows the robot to select the most suitable starting point to adapt the model to the current situation with only a few gradient steps. We compare FAMLE to MBRL, MBRL with a model meta-trained with MAML, and the model-free policy search algorithm PPO on various simulated and real robotic tasks, and show that FAMLE allows robots to adapt to novel damage in significantly fewer time steps than the baselines.
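
    As a rough illustration of the adaptation step described above, the hypothetical Python sketch below keeps several "meta-trained" initial parameter sets, picks the one that best explains a handful of real observations, and fine-tunes it with a few gradient steps. The tiny linear model, the hand-picked priors, and the data are illustrative assumptions standing in for the neural-network dynamics model and the actual meta-training phase of the paper.

# Minimal sketch of the FAMLE adaptation step (assumptions: toy linear dynamics
# models and hand-picked "meta-trained" priors replace the neural network and
# the meta-training procedure of the paper).
import numpy as np

def predict(params, states, actions):
    """Toy dynamics model: next_state = a * state + b * action."""
    a, b = params
    return a * states + b * actions

def mse(params, states, actions, next_states):
    return np.mean((predict(params, states, actions) - next_states) ** 2)

def adapt(priors, states, actions, next_states, lr=0.05, n_steps=5):
    """Pick the most suitable meta-trained prior, then take a few gradient steps."""
    losses = [mse(p, states, actions, next_states) for p in priors]
    params = np.array(priors[int(np.argmin(losses))], dtype=float)
    for _ in range(n_steps):
        err = predict(params, states, actions) - next_states
        grad = np.array([np.mean(2 * err * states), np.mean(2 * err * actions)])
        params -= lr * grad
    return params

# Priors meta-trained for different situations (e.g. intact robot, damaged motor).
priors = [(1.0, 0.10), (1.0, 0.02), (0.9, 0.10)]
# A few real observations from the current (unknown) situation.
states = np.array([0.0, 0.1, 0.2]); actions = np.array([1.0, 1.0, -1.0])
next_states = 0.95 * states + 0.05 * actions
print("adapted model:", adapt(priors, states, actions, next_states))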

    Multi-objective Model-based Policy Search for Data-efficient Learning with Sparse Rewards

    The most data-efficient algorithms for reinforcement learning in robotics are model-based policy search algorithms, which alternate between learning a dynamical model of the robot and optimizing a policy to maximize the expected return given the model and its uncertainties. However, current algorithms lack an effective exploration strategy to deal with sparse or misleading reward scenarios: if they do not experience any state with a positive reward during the initial random exploration, they are very unlikely to solve the problem. Here, we propose a novel model-based policy search algorithm, Multi-DEX, that leverages a learned dynamical model to efficiently explore the task space and solve tasks with sparse rewards in a few episodes. To achieve this, we frame the policy search problem as a multi-objective, model-based policy optimization problem with three objectives: (1) generate maximally novel state trajectories, (2) maximize the cumulative reward, and (3) keep the system in state-space regions for which the model is as accurate as possible. We then optimize these objectives using a Pareto-based multi-objective optimization algorithm. The experiments show that Multi-DEX is able to solve sparse reward scenarios (with a simulated robotic arm) in much less interaction time than VIME, TRPO, GEP-PG, CMA-ES and Black-DROPS.
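
    The Pareto-based trade-off among the three objectives can be illustrated with a small, hypothetical snippet: given per-policy scores for novelty, cumulative reward, and model accuracy, a non-dominated filter keeps the candidates that are not beaten on all three objectives at once. The scores and the basic filter are illustrative assumptions; the paper uses a full Pareto-based evolutionary optimizer.

# Minimal sketch of the Multi-DEX selection step (assumptions: the three
# objective scores per candidate policy are given as plain numbers; a basic
# non-dominated filter stands in for the evolutionary multi-objective optimizer
# used in the paper).
import numpy as np

def pareto_front(scores):
    """Return indices of non-dominated candidates (all objectives maximized)."""
    front = []
    for i, s in enumerate(scores):
        dominated = any(
            np.all(o >= s) and np.any(o > s) for j, o in enumerate(scores) if j != i
        )
        if not dominated:
            front.append(i)
    return front

# Each row: (trajectory novelty, cumulative reward, model accuracy) for one policy.
scores = np.array([
    [0.9, 0.1, 0.5],   # very novel but low reward
    [0.2, 0.8, 0.9],   # high reward in a well-modelled region
    [0.5, 0.5, 0.7],   # balanced
    [0.1, 0.1, 0.2],   # dominated by all of the above
])
print("Pareto-optimal candidates:", pareto_front(scores))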

    Data-Efficient Learning for Robotics Using Simulators

    As soon as robots step out into the real and uncertain world, they have to adapt to various unanticipated situations by acquiring new skills as quickly as possible. Unfortunately, on robots, current state-of-the-art reinforcement learning algorithms (e.g., deep reinforcement learning) require a large amount of interaction time to train a new skill. In this thesis, we explore methods that allow a robot to acquire new skills through trial and error within a few minutes of physical interaction. Our primary focus is to combine prior knowledge from a simulator with the real-world experiences of a robot to achieve rapid learning and adaptation. In our first contribution, we propose a novel model-based policy search algorithm called Multi-DEX that (1) is capable of finding policies in sparse reward scenarios, (2) does not impose any constraints on the type of policy or the type of reward function, and (3) is as data-efficient as state-of-the-art model-based policy search algorithms in non-sparse reward scenarios. In our second contribution, we propose a repertoire-based online learning algorithm called APROL which allows a robot to quickly adapt to physical damage (e.g., a damaged leg) or environmental perturbations (e.g., terrain conditions) and solve the given task. In this work, we use several repertoires of policies generated in simulation for a subset of the possible situations that the robot might face in the real world. During online learning, the robot automatically figures out the most suitable repertoire with which to adapt and control itself. We show that APROL outperforms several baselines, including the current state-of-the-art repertoire-based learning algorithm RTE (Reset-free Trial and Error), by solving the tasks in much less interaction time than the baselines. In our third contribution, we introduce a gradient-based meta-learning algorithm called FAMLE. FAMLE meta-trains the dynamical model of the robot on simulated data so that the model can be adapted quickly to various unseen situations using real-world observations. By using FAMLE within a model-predictive control framework, we show that our approach outperforms several model-based and model-free learning algorithms, and solves the given tasks in less interaction time than the baselines.
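
    To illustrate how an adapted dynamics model is used for control in the third contribution, here is a minimal, hypothetical sketch of a model-predictive control loop built on a learned model: at every step, random action sequences are rolled out through the model and only the first action of the best sequence is executed. The toy one-dimensional model and the random-shooting planner are assumptions for illustration, not the controller used in the thesis.

# Minimal sketch of model-predictive control with an adapted dynamics model
# (assumptions: a toy 1-D model and random-shooting planning stand in for the
# learned neural model and the controller used in the thesis).
import numpy as np

rng = np.random.default_rng(1)

def model(state, action):
    """Adapted dynamics model (placeholder): next_state = 0.95*s + 0.05*a."""
    return 0.95 * state + 0.05 * action

def plan(state, horizon=10, n_samples=200):
    """Random-shooting MPC: sample action sequences, return the best first action."""
    sequences = rng.uniform(-1.0, 1.0, size=(n_samples, horizon))
    best_cost, best_action = np.inf, 0.0
    for seq in sequences:
        s, cost = state, 0.0
        for a in seq:
            s = model(s, a)
            cost += s ** 2 + 0.01 * a ** 2   # drive the state to zero
        if cost < best_cost:
            best_cost, best_action = cost, seq[0]
    return best_action

state = 1.0
for t in range(20):
    state = model(state, plan(state))        # apply only the first planned action
print("final state:", state)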

    SafeAPT: Safe Simulation-to-Real Robot Learning Using Diverse Policies Learned in Simulation

    The framework of sim-to-real learning, i.e., training policies in simulation and transferring them to real-world systems, is one of the most promising approaches towards data-efficient learning in robotics. However, due to the inevitable reality gap between the simulation and the real world, a policy learned in simulation may not always produce safe behaviour on the real robot. As a result, during policy adaptation in the real world, the robot may damage itself or cause harm to its surroundings. In this work, we introduce SafeAPT, a multi-goal robot learning algorithm that leverages a diverse repertoire of policies evolved in simulation and transfers the most promising safe policy to the real robot through episodic interaction. To achieve this, SafeAPT iteratively learns probabilistic reward and safety models from real-world observations, using simulated experiences as priors. It then performs Bayesian optimization to select the best policy from the repertoire according to the reward model, while maintaining the specified safety constraint using the safety model. SafeAPT allows a robot to adapt safely to a wide range of goals with the same repertoire of policies evolved in simulation. We compare SafeAPT with several baselines, in both simulated and real robotic experiments, and show that SafeAPT finds high-performing policies within a few minutes of real-world operation while minimizing safety violations during the interactions.
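
    A highly simplified, hypothetical sketch of SafeAPT's selection loop is given below: each repertoire policy carries a prior belief about its reward and its safety margin taken from simulation, the next policy to try is the most optimistic one among those whose pessimistic safety estimate still satisfies the constraint, and each real episode updates both beliefs. Independent Gaussian beliefs and the stand-in "true" values are assumptions replacing the probabilistic models and the real robot of the paper.

# Minimal sketch of SafeAPT-style safe policy selection over a repertoire
# (assumptions: independent Gaussian beliefs per policy replace the probabilistic
# reward and safety models; true_reward/true_safety stand in for real episodes).
import numpy as np

rng = np.random.default_rng(2)
n_policies = 20

# Priors from simulation: believed reward and safety margin of each policy.
reward_mu = rng.uniform(0.0, 1.0, n_policies);  reward_var = np.full(n_policies, 0.25)
safety_mu = rng.uniform(-0.5, 1.0, n_policies); safety_var = np.full(n_policies, 0.25)
noise = 0.05   # assumed observation-noise variance of a real episode

# Hypothetical ground truth on the real robot (unknown to the algorithm).
true_reward = reward_mu + 0.2 * rng.standard_normal(n_policies)
true_safety = safety_mu - 0.2

for episode in range(10):
    # Constrained acquisition: optimistic on reward, restricted to policies whose
    # pessimistic safety estimate still satisfies the constraint (margin > 0).
    safe = (safety_mu - 2.0 * np.sqrt(safety_var)) > 0.0
    candidates = np.where(safe)[0] if safe.any() else np.arange(n_policies)
    ucb = reward_mu + 2.0 * np.sqrt(reward_var)
    i = candidates[np.argmax(ucb[candidates])]

    # "Execute" the policy on the robot and update both beliefs
    # (conjugate Gaussian update with known observation noise).
    for mu, var, obs in ((reward_mu, reward_var, true_reward[i]),
                         (safety_mu, safety_var, true_safety[i])):
        gain = var[i] / (var[i] + noise)
        mu[i] += gain * (obs - mu[i])
        var[i] *= 1.0 - gain

print("selected policy:", int(i), "posterior reward:", round(float(reward_mu[i]), 3))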

    Adaptive Prior Selection for Repertoire-based Online Adaptation in Robotics

    Video: http://tiny.cc/aprol_video
    Repertoire-based learning is a data-efficient adaptation approach based on a two-step process in which (1) a large and diverse set of policies is learned in simulation, and (2) a planning or learning algorithm chooses the most appropriate policies according to the current situation (e.g., a damaged robot, a new object, etc.). In this paper, we relax the assumption of previous works that a single repertoire is enough for adaptation. Instead, we generate repertoires for many different situations (e.g., with a missing leg, on different floors, etc.) and let our algorithm select the most useful prior. Our main contribution is an algorithm, APROL (Adaptive Prior selection for Repertoire-based Online Learning), that plans the next action by incorporating these priors when the robot has no information about the current situation. We evaluate APROL on two simulated tasks: (1) pushing unknown objects of various shapes and sizes with a robotic arm and (2) a goal-reaching task with a damaged hexapod robot. We compare with Reset-free Trial and Error (RTE) and various single-repertoire baselines. The results show that APROL solves both tasks in less interaction time than the baselines. Additionally, we demonstrate APROL on a real, damaged hexapod that quickly learns to pick compensatory policies to reach a goal while avoiding obstacles in its path.
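
    To illustrate the prior-selection idea, the hypothetical sketch below keeps one tiny repertoire per prior situation (each mapping a policy index to a predicted displacement), tracks how well each repertoire explains the outcomes observed so far, and at every step plans greedily with the currently most likely prior. The one-dimensional displacements, the Gaussian likelihood, and the simulated "damaged" robot are illustrative assumptions, not the models used in the paper.

# Minimal sketch of APROL-style adaptive prior selection (assumptions: each
# "repertoire" is a table mapping a policy index to a predicted 1-D displacement,
# and a Gaussian likelihood over observed displacements stands in for the
# probabilistic model of the paper).
import numpy as np

rng = np.random.default_rng(3)

# Repertoires generated in simulation for different prior situations
# (e.g. intact robot, damaged leg): policy index -> predicted displacement.
repertoires = {
    "intact":      np.linspace(-1.0, 1.0, 11),
    "damaged_leg": np.linspace(-0.5, 0.5, 11),
}
log_lik = {name: 0.0 for name in repertoires}
sigma = 0.1    # assumed observation noise

def act_on_robot(policy_idx):
    """Hypothetical real robot with a damaged leg: reduced displacement + noise."""
    return 0.5 * np.linspace(-1.0, 1.0, 11)[policy_idx] + sigma * rng.standard_normal()

goal, position = 2.0, 0.0
for step in range(15):
    # 1. Select the repertoire (prior) that best explains past observations.
    best = max(log_lik, key=log_lik.get)
    # 2. Greedily pick the policy whose predicted displacement moves us to the goal.
    preds = repertoires[best]
    idx = int(np.argmin(np.abs(position + preds - goal)))
    # 3. Execute it and update every repertoire's likelihood with the outcome.
    observed = act_on_robot(idx)
    position += observed
    for name, table in repertoires.items():
        log_lik[name] += -0.5 * ((observed - table[idx]) / sigma) ** 2
print("chosen prior:", max(log_lik, key=log_lik.get), "final position:", round(position, 2))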

    Corbevax vaccine side effects in children of age group 12–14 years: A prospective observational study

    Introduction: Corbevax was introduced by the Government of India in March 2022 for the vaccination of children between 12 and 14 years of age; however, there is a dearth of literature describing the side-effect profile of Corbevax in the real world/community. Hence, this study was conducted to assess the incidence and types of adverse events following immunization (AEFI) with the Corbevax vaccine. Materials and Methods: A prospective observational study was conducted among 358 children between 12 and 14 years of age who had received the Corbevax vaccine at a tertiary care center in western Maharashtra from March 16 to May 31, 2022. The participants were followed up telephonically for side effects at 24 h, 72 h, and 7 days after the first and second doses of vaccination. Results: Of the 358 children aged 12–14 years who received the Corbevax vaccine, almost 80% of vaccinees developed mild AEFI. Overall, reactogenicity was higher after the second dose, and the most common AEFI was pain in the abdomen, followed by headache and pain at the injection site. The occurrence of AEFI after the first dose (odds ratio: 158.87, 95% confidence interval 46.58–528.28, P < 0.005) was found to be a risk factor for the development of AEFI after the second dose. Conclusion: Corbevax was introduced in India for children between 12 and 14 years of age, but to the best of our knowledge, there is to date no study specifically focused on AEFI due to Corbevax in a community setting. The study findings indicate that Corbevax is a safe vaccine with few mild side effects, thus reinforcing faith in the safety profile of the vaccine.